Input dependent misclassification costs for cost-sensitive classifiers
Authors
Abstract
In data mining and in classification specifically, cost issues have been undervalued for a long time, although they are of crucial importance in real-world applications. Recently, however, cost issues have received growing attention, see for example [1,2,3]. Cost-sensitive classifiers are usually based on the assumption of constant misclassification costs between given classes, that is, the cost incurred when an object of class j is erroneously classified as belonging to class i. In many domains, the same type of error may have differing costs due to particular characteristics of objects to be classified. For example, loss caused by misclassifying credit card abuse as normal usage is dependent on the amount of uncollectible credit involved. In this paper, we extend the concept of misclassification costs to include the influence of the input data to be classified. Instead of a fixed misclassification cost matrix, we now have a misclassification cost matrix of functions, separately evaluated for each object to be classified. We formulate the conditional risk for this new approach and relate it to the fixed misclassification cost case. As an illustration, experiments in the telecommunications fraud domain are used, where the costs are naturally data-dependent due to the connection-based nature of telephone tariffs. Posterior probabilities from a hidden Markov model are used in classification, although the described cost model is applicable with other methods such as neural networks or probabilistic networks.
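As a rough sketch of this decision rule (not the authors' code), the conditional risk of deciding class i for an object x becomes R(i|x) = Σ_j C_ij(x) P(j|x), where each cost entry is now a function of x; the fixed-cost case is recovered whenever every C_ij(x) is constant. The function names, the dictionary-valued object x, and the toy cost functions below are illustrative assumptions, not the paper's notation.

```python
import numpy as np

# Illustrative sketch: conditional risk with input-dependent misclassification
# costs. cost_fns[i][j](x) returns the cost of deciding class i when the true
# class is j, evaluated on the object x.
def conditional_risk(x, posteriors, cost_fns):
    """R(i | x) = sum_j C_ij(x) * P(j | x) for every candidate decision i."""
    n_classes = len(posteriors)
    risks = np.empty(n_classes)
    for i in range(n_classes):
        risks[i] = sum(cost_fns[i][j](x) * posteriors[j] for j in range(n_classes))
    return risks

def decide(x, posteriors, cost_fns):
    """Bayes decision rule: pick the class with minimal conditional risk."""
    return int(np.argmin(conditional_risk(x, posteriors, cost_fns)))

# Toy fraud example: the cost of labelling fraud as normal grows with the
# charge attached to the object, while the other costs stay constant.
cost_fns = [
    [lambda x: 0.0,  lambda x: x["charge"]],   # decide "normal" (class 0)
    [lambda x: 10.0, lambda x: 0.0],           # decide "fraud"  (class 1)
]
x = {"charge": 120.0}
print(decide(x, posteriors=[0.7, 0.3], cost_fns=cost_fns))  # 1: flag as fraud
```

With constant cost functions the same code reduces to the usual fixed cost matrix, so the familiar cost-sensitive decision rule appears as a special case.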
Similar articles
Thresholding for Making Classifiers Cost-sensitive
In this paper we propose a very simple, yet general and effective method to make any cost-insensitive classifier (that can produce probability estimates) cost-sensitive. The method, called Thresholding, selects a proper threshold from training instances according to the misclassification cost. Similar to other cost-sensitive meta-learning methods, Thresholding can convert any existing (and fut...
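A minimal, generic sketch of cost-based threshold selection for the binary case is given below; it is not necessarily the exact Thresholding procedure of the cited paper, and the function name and the simple exhaustive search over candidate thresholds are assumptions made for illustration.

```python
import numpy as np

# Generic sketch: choose the probability threshold that minimizes total
# misclassification cost on labelled training data.
def select_threshold(probs, labels, cost_fp, cost_fn):
    """probs: P(y=1|x) from any probabilistic classifier; labels: 0/1 array."""
    best_t, best_cost = 0.5, np.inf
    for t in np.unique(probs):
        pred = (probs >= t).astype(int)
        cost = (cost_fp * np.sum((pred == 1) & (labels == 0))
                + cost_fn * np.sum((pred == 0) & (labels == 1)))
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

probs  = np.array([0.1, 0.4, 0.35, 0.8])
labels = np.array([0, 0, 1, 1])
print(select_threshold(probs, labels, cost_fp=1.0, cost_fn=10.0))  # 0.35
```

Because false negatives are ten times as expensive in this toy call, the selected threshold is pushed well below 0.5.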
Cost-Sensitive Learning with Neural Networks
In the usual setting of machine learning, classifiers are typically evaluated by estimating their error rate (or, equivalently, their classification accuracy) on the test data. However, this makes sense only if all errors have equal (uniform) costs. When different errors incur different costs, classifiers should instead be evaluated by comparing the total costs of their errors. Classifiers are ty...
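A toy illustration of this point: two sets of predictions with the same error rate can differ widely in total cost, so the total cost under a cost matrix is the quantity to compare. The cost values below are invented for the example.

```python
# cost_matrix[i][j]: cost of predicting class i when the true class is j.
def total_cost(y_true, y_pred, cost_matrix):
    return sum(cost_matrix[p][t] for p, t in zip(y_pred, y_true))

y_true = [1, 0, 0, 1]
y_pred = [0, 0, 0, 1]          # one false negative, error rate 25%
cost   = [[0, 50],             # predicting 0 on a true 1 costs 50
          [1, 0]]              # predicting 1 on a true 0 costs 1
print(total_cost(y_true, y_pred, cost))  # 50
```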
Cost-Sensitive Learning in Medicine
This chapter introduces cost-sensitive learning and its importance in medicine. Health managers and clinicians often need models that try to minimize several types of costs associated with healthcare, including attribute costs (e.g. the cost of a specific diagnostic test) and misclassification costs (e.g. the cost of a false negative test). In fact, as in other professional areas, both diagnost...
A New Formulation for Cost-Sensitive Two Group Support Vector Machine with Multiple Error Rate
Support vector machine (SVM) is a popular classification technique which classifies data using a max-margin separating hyperplane. The normal vector and bias of this hyperplane are determined by solving a quadratic program, so SVM training amounts to an optimization problem. Among the extensions of SVM, the cost-sensitive scheme refers to a model with multiple costs which conside...
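For illustration only, a common practical way to obtain a cost-sensitive SVM is to weight the per-class error penalties in the quadratic program, for example through scikit-learn's class_weight parameter; this is a standard variant, not the new formulation proposed in the cited paper.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic two-class data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Errors on class 1 are penalized five times more heavily than on class 0,
# which shifts the max-margin hyperplane toward the cheaper class.
clf = SVC(kernel="linear", class_weight={0: 1.0, 1: 5.0}).fit(X, y)
print(clf.score(X, y))
```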
Classifier Learning for Imbalanced Data with Varying Misclassification Costs: A Comparison of kNN, SVM and Decision Tree Learning
This thesis theoretically discusses the abilities of three commonly used classifier learning methods and optimization techniques to cope with characteristics of real-world classification problems, more specifically varying misclassification costs, imbalanced data sets and varying degrees of hardness of class boundaries. From these discussions a universally applicable optimization framework is de...
Publication year: 2000